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Wc present a reformulation of unsteady turbulent flow simulations. The initial condition is relaxed 
and information is allowed to propagate both forward and backward in time. Simulations of chaotic 
dynamical systems with this reformulation can be proven to be well-conditioned time domain bound- 
ary value problems. The reformulation can enable scalable parallel-in-time simulation of turbulent 
flows. 



I. NEED FOR SPACE-TIME PARALLELISM 



The use of computational fluid dynamics (CFD) in science and engineering can be categorized into Analysis 
and Design. A CFD Analysis performs a simulation on a set of manually picked parameter values. The flow 
field is then inspected to gain understanding of the flow physics. Scientific and engineering decisions are then 
made based on understanding of the flow field. Analysis based on high fidelity turbulent flow simulations, 
particular Large Eddy Simulations, is a rapidly growing practice in complex engineering applications^^. 

CFD based Design goes beyond just performing individual simulations, towards sensitivity analysis, op- 
timization, control, uncertainty quantification and data based inference. Design is enabled by Analysis 
capabilities, but often requires more rapid turnaround. For example, an engineer designer or an optimiza- 
tion software needs to perform a series of simulations, modifying the geometry based on previous simulation 
results. Each simulation must complete within at most a few hours in an industrial design environment. 
Most current practices of design use steady state CFD solvers, employing RANS models for turbulent flows. 
Design using high fidelity, unsteady turbulent flow simulations has been investigated in academia '. De- 
spite their great potential, high fidelity design is infeasible in an industrial setting because each simulation 
typically takes days to weeks. 
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FIG. 1. Exponential increase of high performance computing power. Data originate from top500.org. GFLOPS, 
TFLOPS, PFLOPS and EFLOPS represent 10^, 10^^, 10^^ and 10^** FLoating point Operations Per Second, respec- 
tively. 



The inability of performing high fidelity turbulent flow simulations in short turnaround time is a barrier 
to the game-changing technology of high fidelity CFD-based design. Nevertheless, development in High 
Performance Computing (HFC), as shown in Figure 1, promises to delivery in about ten years computing 
hardware a thousand times faster than those available today. This will be achieved through extreme scale 
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parallelization of light weight, communication constrained cores A 2008 study developed a straw man ex- 
treme scale system that could deliver 10^^ FLOPS by combining about 166 million cores '''. CFD simulations 
running on a million cores can be as common in a decade as those running on a thousand cores today. 

Will the projected thousand-fold increase in the number of computing cores lead to a thousand-fold 
decrease in turnaround time of high fidelity turbulent flow simulations? If the answer is yes, then the same 
LES that takes a week to complete on today's systems would only take about 10 minutes in 2022. High 
fidelity CFD-bascd design in an industrial setting would then be a reality. 
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FIG. 2. Typical parallel efficiency of LES in complex geometry, derived from'. The upward triangles, circles, 
downward triangles and squares represent meshes of 2M, IM, 216k and 64k grid points respectively. 

The answer to this question hinges on development of more efficient, scalable and resilient parallel sim- 
ulation paradigm. Current unsteady flow solvers are typically parallel only in space. Each core handles 
the flow field in a spatial subdomain. All cores advance simultaneously in time, and data is transferred 
at subdomain boundaries at each time step. This current simulation paradigm can achieve good parallel 
scaling when the number of grid points per core is large. However, Figure 2 shows that parallel efficiency 
quickly deteriorates as the number of grid points per core decreases below a few thousand. This limit is due 
to limited inter-core data transfer rate, a bottleneck expected to remain or worsen in the next generation 
HFC hardware. Therefore, it would be inefficient to run a simulation with a few million grid points on a 
million next-generation computing cores. We need not only more powerful computing hardware but also 
a next-generation simulation paradigm in order to dramatically reduce the turnaround time of unsteady 
turbulent flow simulations. 

A key component of this enabling, next-generation simulation paradigm can likely be space-time parallel 
simulations. These simulations subdivide the 4-dimensional space-time computational domain. Each com- 
puting core handles a contiguous subdomain of the simulation space-time. Compared to subdivision only 
in the 3-dimensional space, space-time parallel simulations can achieve signiflcantly higher level of concur- 
rency, and reduce the ratio of inter-core communication to floating point operations. Each core computes 
the solution over a fraction of the entire simulation time window (Fig 3), reducing the simulation turnaround 
time. 
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FIG. 3. Illustration of spatial parallelism (left) and space-time parallelism (right). 



Space-time parallel simulations can significantly reduce the ratio of inter-core data transfer to floating 

point operations. This ratio can be estimated from the fraction of grid points lying on subdomain interfaces. 

_i 

The fraction of interfacial grid points in a spatially parallel simulation is estimated to be ^ 6 {M/N) ^ , 
where M is the total number of grid points assumed to be uniformly distributed in a cubical domain, and N 
is the number of computing cores. The fraction of space-time grid points in a space-time parallel simulation 
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is estimated to be 8 {MT/N)~^ , where T is the total number of time steps. Table I shows that typical 
space-time parallel simulations on a million cores have significantly smaller ratio of inter-core data transfer 
to floating point operations than equivalent spatially parallel simulations. If a space-time parallel simulation 
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TABLE I. For typical simulations of M grid points and T time steps running on A'^ = 10® cores, this table estimates 
the grid points per core of a spatially parallel simulation rriso = M/N , the space-time grid points per core of a space- 
time parallel simulation nist = MT/N, and the equivalent grid points per core of a spatially parallel simulation with 

— i ' _ i 

the same fraction of interfacial grid points as the space-time parallel simulation Grhso^ ~ Sm^j* . 

achieves the same parallel efficiency of a spatially parallel simulation for the same fraction of interfacial grid 
points, Table I and Fig 2 suggest that a typical million-grid-point turbulent flow simulation ca run efficiently 
on a million cores with good parallel efficiency. 

By reducing the ratio of inter-core communication to floating point operations, space-time parallel simula- 
tions has the potential of achieving high parallel efficiency, even for relatively small turbulent flow simulations 
on extreme scale parallel machines. Combined with next generation computing hardware, this could lead 
to typical simulation turnaround time of minutes. Space-time parallelism could enable high fidelity CFD- 
based design, including sensitivity analysis, optimization, control, uncertainty quantification and data-based 
inference. 



II. BARRIER TO EFFICIENT TIME PARALLELISM 

High fidelity turbulent flow simulation and design with fast turnaround time can be enabled on extreme 
scale parallel computers through space-time parallelism. Such simulation paradigm requires efficient, scalable 
decomposition both in space and in time. Although scalable spatial domain decomposition for turbulent 
flow simulations has seen many applications ' , scalable time domain decomposition for such simulations is 
still at the frontier of research. 

Time domain decomposition methods have a long history^ . Popular methods include the multiple shooting 
method, time-parallel and space-time multigrid method, and the Parareal method. As exemplified in Fig. 4, 




FIG. 4. Illustration of Parareal extracted from reference '. G and F are the coarse and fine solvers, respectively. 

most time domain parallel methods divides the simulation time interval into small time chunks. They start 
with an initial estimate obtained by a coarse solver. Iterations are then performed over the entire solution 
history, aiming to converge to the solution of the initial value problem. In particular, the Parareal method 
has been demonstrated to converge for large scale turbulent plasma simulations". 

However, many time parallel methods suffer from a common scalability barrier in the presence of chaotic 
dynamics. The number of required iterations increases as the length of the time domain increases. As 
demonstrated for both the Lorenz attractor^" and a turbulent plasma simulation ', the number of iterations 
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for reaching a given tolerance is often proportional to the length of the time domain. Because the the number 
of operations per iteration is also proportional to the length of the time domain, the overall computation 
cost scales with the square of the time domain length. 

This poor scalability is related to the characteristic sensitivity of chaos. A small perturbation to a chaotic 
dynamical system, like the turbulent flow shown in Fig 5, can cause a large change in the state of the system 
at a later time. This sensitivity causes a significant barrier to fast convergence of time domain parallel 




FIG. 5. Spanwise velocity difference between two unsteady flow solutions at t = 2, t — 22 and t = 62, shown in the 
upper, middle and lower plots, respectively. A 10^^ magnitude perturbation at t = is the only difference between 
the two solutions. Reo ~ 500; periodic spanwise extent of AD is used. 
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FIG. 6. Error in 9 Parareal iterations of the Lorenz attractor^". Horizontal axis represent time chunks; each time 
chunk has length of 0.1. 
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FIG. 7. Error in 12 Parareal iterations of a turbulent plasma simulation. Figure is extracted from reference'*. 
Horizontal axis represent time chunks. The solid red line represents the first iteration; the yellow dashed line at the 
lower right corner represents the 12th iteration. 



methods. In the Parareal method, for example, a small difference between the coarse and fine solvers in the 
early time chunks can cause a large difference in the initial estimate and the converged solution. A small 
correction made in the earlier time chunks can result in a large update in later time chunks. As a result, 
the later time chunks can only converge after the earlier time chunks converge (Figs 6 and 7). The number 
of iterations required to converge the entire solution below a certain tolerance is therefore proportional to 
the length of the time domain. 
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The cause of this poor scalabihty, the sensitivity of chaos, can be quantified by the Lyapunov exponent. 
Consider two otherwise identical simulations with an infinitesimal difference in their initial condition. If the 
simulated system is chaotic, then the difference between these two simulations would grow as exp At. This 
A is the maximal Lyapunov exponent, often just called as the Lyapunov exponent. Mathematically, for a 
dynamical system with an evolution function 

A=limlimilog&±i^^lf« 

almost surely for any v and u on the attractor. 

By describing how much a small perturbation changes the solution at a later time, the Lyapunov exponent 
A determines the convergence behavior of time domain parallel method. A small error of magnitude e at 
t = in a Parareal coarse solver can cause an error of size ~ e e^* at a later time t. A small update of size 
e at an earlier time ti can require an update of size ~ e e^*-*^"*!-* at a later time ti. It is no surprise that a 
larger Lyapunov exponent A poses a greater challenge to time parallel methods like Parareal. 

The influence of the Lyapunov exponent A on the convergence of Parareal is particularly significant in 
chaotic dynamical systems that has multiple time scales. The maximal Lyapunov exponent, with a unit 
of time~^, is often inversely proportional to the smallest time scale. Consequently, time domain parallel 
methods are particularly challenged by chaotic multiscale simulations in which very small time scales are 
resolved. 




FIG. 8. A solution of the multiscale coupled Lorenz system (1) with c — 1/4. The thick, red line indicates the slow 
variable Zs. The thin, blue line indicates the fast variable Zf. 
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FIG. 9. Error in 9 Parareal iterations of the coupled Lorenz system (1) for c = 1/2 (upper) and c = 1/4 (lower). 
The horizontal axes represent time; each time chunk has length of 0.1. The coarse solver is forward Euler of step 
size 0.004 x c; the fine solver is fifth order Runge Kutta with adaptive time stepping. 



A simple example of such chaotic systems with multiple time scales is the coupled Lorenz system 

Vs = Xs{r ~ Zg) ~ ys, cyf ^ Xf{r - Zf) - yf 
Zs = XsVs - I3{zs + Zf), czf ^ Xfyf - /3(zs + Zf) 



(1) 



Fig. 8 shows that the coefficient c < 1 determines the time scale of the fast dynamics, whereas the slower 
dynamics has a time scale of about I. When c decreases, the maximal Lyapunov exponent A increases. 
Figure 9 shows that Parareal converges proportionally slower for smaller c. 
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These pieces of evidence suggest that time domain parallel methods such as Parareal can suffer from lack 
of scalability in simulating chaotic multiscale systems, e.g., turbulent flows. The length of the time domain 
in a turbulent flow simulation is often multiples of the slowest time scale, so that converged statistics can be 
obtained. The finest resolved time scale that determines the maximal Lyapunov exponent can be orders of 
magnitude smaller than the time domain length. Consequently, the required number of time domain parallel 
iterations can be large. Increasing the time domain length or the resolved time scales would further increase 
the required number of iterations. 

III. REFORMULATING TURBULENT FLOW SIMULATION 

Time domain parallel methods, such as Parareal, scale poorly in simulation of chaotic, multiscale dynam- 
ical systems, e.g. turbulent flows. This poor scalability is because the initial value problem of a chaotic 
dynamical system is ill-conditioned. A perturbation of magnitude e at the beginning of the simulation 
can cause a difference of e exp At at time t later, where A is the maximal Lyapunov exponent. The condition 
number of the initial value problem can be estimated to be 

T 

K ^ exp AT ^ exp — , 

T 

where T is the time domain length and t is the smallest resolved chaotic time scale. This condition number 
can be astronomically large for multiscale simulations such as DNS and LES of turbulent flows. Efficient 
time domain parallelism can only be achieved through reformulating turbulent flow simulation into a well- 
conditioned problem. 

We reformulate turbulent flow simulation into a well-conditioned problem by relaxing the initial condition. 
Not strictly enforcing a initial condition is justiflcd under the following two assumptions: 

1. Interest in statistics: all quantities of interest in the simulation are statistics of the flow fleld in 
quasi-equilibrium steady state. These include the mean, variance, correlation, high order moments and 
distributions of the flow fleld. 

2. Ergodicity: starting from any initial condition, the flow will reach the same quasi-equilibrium steady 
state after initial transient. All statistics of the flow fleld are independent of the initial condition after 
reaching the quasi-equilibrium steady state. 

Under these assumptions, satisfying a particular initial condition is not important to computing the quan- 
tities of interest. Instead of trying to flnd the flow solution that satisfies both the governing equation and 
the initial condition, we aim to flnd a flow solution satisfying only the governing equation. For example, we 
can formulate the problem as flnding the solution of the governing equation that is closest (in sense) to 
a reference solution, which can come from a coarse solver or from solution at a different parameter value. 

Relaxing the initial condition annihilates the ill-conditioning in simulating chaotic dynamical systems, 
making time parallelization efficient and scalable. If the initial condition were fixed, a small perturbation 
near the beginning of the simulation would cause a large change in the solution later on. If the initial 
condition were relaxed, a small perturbation near the beginning of the simulation could be accommodated 
by a small change in the initial condition, with little effect on the the solution after the perturbation. 

The following geometric analysis demonstrates the stability of a chaotic dynamical system with relaxed 
initial condition. The phase space around a trajectory can be decomposed into a stable manifold and an 
unstable manifold. Consider an arbitrary impulse perturbation to the system as shown in Fig 10. Because 
the perturbation can contain a component along the unstable manifold, the perturbed trajectory would 
exponentially diverge from the unperturbed trajectory if the initial condition were fixed. However, If the 
initial condition is relaxed, the perturbation along the unstable manifold can be annihilated by slightly 
adjusting the initial condition, as shown in Fig 10. This small adjustment in the initial condition makes it 
sufficient to only consider the decaying effect of the perturbation along the stable manifold. The effect of the 
perturbation along the unstable manifold is reflected by the small trajectory change before the perturbation. 

Stability of the trajectory with a relaxed initial condition is achieved by splitting a perturbation into 
stable and unstable components, and propagating their effects forward and backward in time, respectively. 
This stability property is formally known as the Shadowing Lemma^ '. For any S > there exists e > 0, 
such that for every Ur that satisfies \\dur/dT — TZur\\ < e , < i < T , there exists a true solution u and a 
time transformation tij), such that \\u(t) — Ur{T)\\ < 5, |1 — dt/dT\ < S, and du/dt — TZu = , < r < T. 
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FIG. 10. A geometric illustration of the response of a chaotic dynamical system to an impulse perturbation. The black 
arrow represents an unperturbed trajectory running along the z-axis. The red arrows represent the perturbation. 
The magenta dashed line represents the perturbed trajectory with fixed initial condition. The blue line represents a 
perturbed trajectory with relaxed initial condition. 



The shadowing lemma theorizes the trajectory stability of chaotic dynamical systems with relaxed initial 
conditions. This paper develops the following least squares formulation designed to take advantage of this 
stability numerically 



1 

u,ri = argmin - / ((u — Ur, u — Ur) + rf) dt 

2 Jo (2) 

s.t. (l + ?7)— =7eu, 0<r<T 

OT 

This least squares problem finds the solution of the differential equation du/dt = TZu that is closest to a 
reference solution Ur in some Hilbert space, with a time transformation r satisfying dr/dt = I + rj. 

The least squares problem (2) has a unique and stable solution when the reference solution Ur is close to 
the actual solution u, so that u — Ur can be described by the corresponding linearized least squares problem. 
With a convex, positive definite objective function and a linear constraint, this problem has a unique and 
stable solution. 

The constraint least squares problem (2) can be simplified by forming its Lagrange function 



A 



{u — Ur, U — Ur) + if' 
d ^ \ 

where w is the Lagrange multiplier. Variation of the Lagrange function can be transformed through inte- 
gration by parts 

Z-^. / / du 

SA = 



rx \m 
w, — Lou ) dt 

at / 

(4) 

Su , U — Ur C*w ) dt 



+ f Sri (t]+(^w , ^'j'^dt+{Su,w) 



The solution of the constraint least squares problem (2) satisfies the first order optimality condition 6 A — 
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for all 5u and 5rj. Therefore, Equation (4) leads to 

dw 



aw / du\ 



(5) 



and the boundary condition w{t = 0) = w{t = T) = 0. 

Substituting these equalities back into the constraint in (2), we obtain the following second order boundary 
value problem in time: 



d ( dw 



dt \ at 



+ C*w + u, 



' dw 

n[ ^ + c*w + u, 

at 







w\ „ = w „ = 

Ir— Ir— T 



where the transformed time derivative is 



and the linearized operator is 



SUu 



Su 



(6) 



(7) 



(8) 



L* is the adjoint operator of L. 

Our reformulation of chaotic simulation produces a least squares system (2), whose solution u satisfies 
the governing equation, and has relaxed initial condition. This solution can be found by first solving 
the Lagrange multiplier w through Equation (6), a second order boundary value problem in time, then 
computing u from Equation (5). This reformulated problem does not suffer from ill-conditioning found in 
initial value problems of chaos, therefore overcomes a fundamental barrier to efficient time-parallelism. The 
reformulated boundary value problem in time (6) is well-conditioned and suitable for time domain parallel 
solution methods. These properties are demonstrated mathematically in Section IV and Appendix A, and 
through examples in Sections V and VL 



IV. SOLVING THE REFORMULATED SYSTEM 



Solving the reformulated problem involves two steps: 1. solving a nonlinear boundary- value-problem in 
time (6) for the Lagrange multiplier w^ and 2. evaluating Equation (5) for the solution u. Performing 
the second step in space-time parallel is relatively straightforward. This section introduces our iterative 
algorithm for the first step based on Newton's method. 

1. Decompose the 4D spatial-temporal computational domain into subdomains. 

2. Start at iteration number k ~ 1 with an initial guess u^^'> distributed among computing cores. 

3. Compute the residual du^''~^^ /dt — TZ{u^''~^^) in each subdomain. Terminate if the magnitude of the 
residual meets the convergence criteria. 

4. Solve a linearized version of Equation (6) for w^^^ using a space-time parallel iterative solver 

(detailed follows in the rest of this section). 

5. Compute the time dilation factor rj and transformed time dt = dr exp {—ri{T)). 

6. Compute the updated solution m^'^^ by parallel evaluation of Equation (5) with Ur ~ u'-'"'^^-' and w = w^''\ 

7. Update to transformed time by letting t = t; continue to Step 3 with k ~ k + 1. 

Note that the update in Step 5 avoids degeneracy when ?y < —1, and is first order consistent with dt = 
dr/ (1 + ii{t)) implied by the definition of rj. 

The only step that requires detailed further explanation is Step 4. All other steps can be evaluated in 
space-time parallel with minimal communication across domain boundaries. Solving Equation (6) for w 
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using Newton's method requires a linear approximation to the equation. Ignoring second order terms by 
assuming both u — Ur and 77 are small, Equation (6) becomes 

d'^w ( d „^ „ d 



C* - w + {££* + V)w = fr 



dr \dT dr J ^ ' (9) 



where 7^ is a symmetric positive semi-definite spatial operator 



^- = (->^)^. (10) 



and 



^ = ^-7^K) (11) 

The linearized equation (9) also has a weak form 

a{5w,w) + {{5w) ^0 (12) 

and an energy minimization form, which is the Lagrangian dual of the linearized least squares (2), 

. / a(w,w) A 
w = argmm h I , (lo) 



tu|o,T=0 

The symmetric, positive definite and well-conditioned bilinear form a is 



a{w, v)= r (^ + C*w, ^ + C*v) + {w, Vv) dr (14) 



and linear functional [ is 



1{W) = f {wJr)dT. (15) 

Jo 

The derivation and properties of the weak and energy forms are detailed in Appendix A. These forms of 
Equation (9) makes it possible to achieve provable convergence of iterative solution techniques. 

Wc implemented both a direct solver and a parallel multigrid-in-time iterative solver for the symmetric 
boundary value problem (9). Our iterative method is similar to conventional parallel multigrid methods for 
second order elliptic equations^ ', except for that the "grid" is a series of time steps. When the operator C 
is a large scale spatial operator, the multigrid-in-time method can be extended to a space-time multigrid 
method. 



V. VALIDATION ON THE LORENZ SYSTEM 



The algorithm described in Section IV is applied to the Lorenz system. Both the solution u{t) and the 
Lagrange multiplier w{t) in this case are in the Euclidean space R^. The three components of the solution 
are denoted as u(t) = {x, y, z) by convention. The governing equation is 

u = TZu = (^s{y — x), x{r — z) ~ xy — hz^ , (16) 

where s = 10, r = 35 (the Rayleigh number) and 6 = 8/3 are three parameters of the system. 

The initial guess u*-"-* used in Section IV's algorithm is a numerical solution of the system at a different 
Rayleigh number r = 25. This initial guess has 10,001 time steps uniformly distributed in a time domain of 
length 100. 

Figure 11 shows that the Newton's iterations described in Section IV converges to machine precision within 
9 iterations. The statistical quantity (mean z) of the converged solution also matches that computed from 
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FIG. 11. Solution of the Lorenz system at s,r,b — 10,35,8/3 with the algorithm of Section IV. The initial guess 
u^'^\t),t G [0,100] is a solution of the system at s,r,b = 10,25,8/3. The horizontal axis indicates the iteration 
number k. The dot-dash line with open triangles indicates the norm of the residual /t''''. The red, solid line with 
filled circles represents the time averaged z of the solution The horizontal dashed line indicates the mean z 

computed from an independent initial value solution at r = 35. 




FIG. 12. The black trajectory is a phase space plot of the initial guess ' , a. solution of the Lorenz system at 
s,r,b = 10,25,8/3. The red trajectory plots the converged solution a least squares solution of the Lorenz 

system at s,r,b — 10, 35, 8/3. 



a conventional time integration. The phase space plots in Figure 12 shows that the converged trajectory 
hes on the attractor of the Lorenz system at r = 35, and is significantly different from the initial guess 
trajectory. Figure 13 shows that the converged trajectory closely tracks the initial guess trajectory on a 
transformed time scale. 

Because the reformulated system do not suffer from the ill-conditioning of the initial value problem, 
parallelization in time domain does not encounter the same problem as Parareal method for solving initial 
value problems. Figure 14 shows that our algorithm is scalable up to a time domains of length over 5000. 
In comparison, Parareal would take many more iterations to converge on longer time domains. Converging 
on a time domain of length 5000 would require over 500 Parareal iterations based on estimates from Figure 
6. 
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FIG. 13. The blue, dashed line represents z{t) of the mitial guess, a solution of the system at r = 25. The red, solid 
line represents z{t) of the converged solution at r = 35. 
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FIG. 14. Speed up of the multigrid-in-time solver of the linearized system (9) using up to 128 cores. 4096 time steps 
is allocated per core with uniform time step size of 0.01. The same convergence criterion is used for all cases. 



VI. VALIDATION ON THE KS EQUATION 



The algorithm described in Section IV is applied to the Kuramoto-Sivashinsky equation. Both the solution 
u(x,t) and the Lagrange multiplier w{x,t) in this case are functions in the spatial domain of a; G [0, 100]. 
They also both satisfy the boundary conditions 



du 

£c=o,ioo dx 



a:=0,100 



The governing equation is defined by 



(17) 



where the parameter c is introduced to control the behavior of the system. Figure 15 shows that solutions 
of this initial boundary value problem at c = — 0.1 and c = — 1 reach different chaotic quasi-cquilibrium after 
starting from similar initial conditions. The mean of the solution at equilibrium is higher in the c = — 1 case 
than in the c = —0.1 case. 

Section IV's algorithm is tested for the Kuramoto-Sivashinsky equation at c = —1. The initial guess 
is a numerical solution of the Kuramoto-Sivashinsky equation at c = —0.1. This initial guess has 401 time 
steps uniformly distributed in a time domain of length 100. Figure 16 shows that the Newton's iterations of 
Section IV converges on the Kuramoto-Sivashinsky equation to machine precision within about 10 iterations. 
The statistical quantity (mean u) of the converged solution is close to that computed from a time integration. 
The small discrepancy between them may be caused by the difference between the infinite time statistical 
average and the finite time average over the time domain length of about 120. Figure 17 shows that the 
converged solution shadows the initial condition on a stretched time scale. 
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FIG. 16. Solution of the Kuramoto-Sivashinsky (KS) equation (17) at c = —1 with the algorithm of Section IV. 
The initial guess u^'^\x,t),x £ [0, 100], f £ [0,50] is a solution of the KS equation at c = —0.1. The horizontal axis 
indicates the iteration number k. The dot-dash line with open triangles indicates the norm of the residual fi-''^ The 
red, solid line with filled circles represents the solution u'^''\x,t) averaged over x and t. The horizontal dashed line 
indicates the averaged u{x,t) computed from an independent initial value problem at c = —1. 



VII. CONCLUSION 



We used the ergodic hypothesis to relax the initial condition in simulating chaotic, unsteady dynamical 
systems. The system with relaxed initial condition do not suffer from ill-conditioning encountered in fixed- 
initial condition problems. Consequently, efficient parallcl-in-time simulation can be performed without 
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FIG. 17. The initial guess ' and the converged solution u'-^^^ in Section IV's algorithm applied to the Kuramoto- 
Sivashinsky equation (17) at c = —1. 



scalability problems from which time parallelization of initial value problem suffers. The relaxed- initial- 
condition system is formulated into a least squares problem, whose solution can be obtained through a 
second order boundary value problem in time. We solve this boundary value problem using an iterative 
solution algorithm based on Newton's method. All steps in the algorithm can be efficiently performed in 
time-parallel. 

This methodology is demonstrated on simulations of the Lorenz system and the Kuramoto-Sivashinsky 
equation. The iterative algorithm converges in both cases to machine precision solutions within 10 iterations. 
The statistical quantities of the converged solutions with relaxed initial conditions match those computed 
from traditional time integration methods. Time parallel scalability of this method is demonstrated on the 
Lorenz attractor. 
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Appendix A: Well conditioned 



By taking the inner product of a test function v with both sides of Equation (9), integrating over the time 
domain [0,T], and using integration by parts, we obtain 

1^ + C*w, ^ + C*v'j + {w, Pv) - (v, U) dr^O, (Al) 

which is the weak from (12). We decompose the sohition w into the adjoint Lyapunov eigenvectors in order 
to show that the symmetric bilinear form a{u, v) as in Equation (14) is positive definite and well-conditioned. 
The Lyapunov decomposition for an 7i-dimensional dynamical systems (including discretized PDEs) is 

wit) = Mt)Mt) : where + C*d^, = A,0, (A2) 

i—l 

Each Wi is a real valued function of t, and 0i(t) is the corresponding adjoint Lyapunov eigenvectors. By 
convention, the Lyapunov exponents are sorted such that Ai > A^+i. Combining the Lyapunov eigenvector 
decomposition (A2) results in 



dr ^\ dt 

i—l ^ 

By substituting this equality and Equations (10) into the bilinear form and using the positive semi- 
definiteness of T', we obtain 

a{w, w) > 



The eigenvalues of the matrix I {(pi , 0j ) ) are bounded away from both zero and infinity for uniformly 

V / i,j—l,...,n 

hyperbolic dynamical systems, i.e., there exists < c < C < oo such that 

n n n 

i=l j,i=l i=l 

for any xi, . . . ,Xn- By applying these bounds to (w, w) and to Equation (A3), we obtain 



{w,w)<Cj2 f 



2 

> Wi dt 



(A5) 

dt 
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Most high dimensional, ergodic chaotic systems of practical interest are often assumed to be quasi-hyperbolic, 
whose global properties are not affected by their non-hyperbolic nature. Therefore we conjecture that (A5) 
also holds for these quasi-hyperbolic systems when T is sufficiently large. Because of the boundary conditions 

w{Q) = w{T) = 0, 



-I- XiWi ) dt 



=0 >0 



dwi 



Also because of the boundary conditions, each Wi(t) admits a Fourier sine series 

mnt rmr mnt 



thus, 



The Parscval identity applies to both orthogonal scries, leading to the Poincare inequality 



2 2 f'^ 



By combining this inequality with the inequalities (A5) and (A6), we obtain 



(A6) 



^— \ lllti L IIL/i lllli L 

^H^) = ^™ ~Y~ ' ^ ~ ~Y~ ' ^ ' 

rn—l 



dwi \ ^ ^ rmr rmrt , , 

- 2^ — cos — . (A8) 

m— 1 



T 

>(^/Tf (A9) 

/7r\2 T ^ „ /7r\2 /-^ 

> 



aK^) ^ / Try ^^^^^ 



{w,w) \TI C 

This inequality leads to our conclusion that the symmetric bilinear form 0(111, v) is positive definite. Equation 
(9) is well-conditioned because if the weak form (12) holds and o(w, w) = ^{w), then 

\\i\\\\w\\>\{w)=a{wM>{^)' ^WMW (All) 

therefore, 

s<n'-- (A12) 

||[|| \t) c ^ ' 

This inequality bounds the magnitude of the solution by the magnitude of the perturbation. It also bounds 
the magnitude of the solution error by the magnitude of the residual. This condition number bound is C/c 
times that of the Poisson equation in a ID domain [0,r]. 



